Signal based vetoes for the detection of gravitational waves from inspiralling compact binaries.
The matched filtering technique is used to search for gravitational wave signals of a known form in the data taken by ground-based detectors. However, the analyzed data contains a number of artifacts arising from various broad-band transients (glitches) of instrumental or environmental origin which can appear with high signal-to-noise ratio on the matched filtering output. This paper describes several techniques to discriminate genuine events from the false ones, based on our knowledge of the signals we look for. Starting with the $\chi^2$ discriminator, we show how it may be optimized for free parameters. We then introduce several alternative vetoing statistics and discuss their performance using data from the GEO 600 detector.

I. INTRODUCTION 

The first generation of gravitational wave detectors is either already online and gathering scientific data (LIGO […], GEO 600 […], TAMA […]) or about to start taking data (VIRGO […]). LIGO and GEO 600 have successfully completed several short data taking runs (so-called science runs) in coincidence […]. TAMA has accumulated over 2000 hours of data […], and a large portion of this data was taken in coincidence with LIGO and GEO 600. All detectors are currently in the commissioning stage and are steadily approaching their design sensitivities. Improvements in the performance of the detectors are being made in several directions: (i) sensitivity improvements (tracing and reducing the noise contributed by different subsystems), (ii) increasing the duty cycle (the time spent acquiring data suitable for astrophysical analysis, as a fraction of the total operational time), and (iii) improving the data quality (stationarity).

However, at present the data is neither stationary nor Gaussian over time scales longer than a few minutes. The detector output contains various spurious transient events, and the output of an optimal filter unfortunately reflects these events, especially the glitches. By glitch we mean here a short-duration spurious transient (of almost delta-function shape) with a broad-band spectrum that leads to a high signal-to-noise ratio (SNR) at the output of the matched filter. Distinguishing these events from real events of astrophysical origin and dropping them from consideration is called vetoing. In addition to the main gravitational wave channel, interferometers record a large volume of auxiliary data from environmental monitors and various signals from the many detector subsystems. These monitors help to find correlations between abnormalities in environmental or instrumental behaviour and high-SNR events in the strain channel. Transients that appear in both the strain channel and an auxiliary channel (within a coincidence window) can be discarded on the grounds of noise coupling between the strain channel and the detector's subsystems, provided we understand the physical mechanism of such a coupling. This is what we refer to as instrumental vetoes. Instrumental vetoes help to remove some fake events, but they are not sufficient: there are other events of artificial origin for which the information that would help us to remove them either was not recorded or has not been recognised. So in addition to instrumental vetoes we need to apply signal based vetoes: vetoes based on our knowledge of the signal's shape in the frequency and/or time domain.
For signal based vetoes we need to construct a statistic which helps us to discriminate false signals from true ones. The $\chi^2$ time-frequency discriminator suggested in […] is an example of such a statistic. This vetoing statistic is used in searches for gravitational waves from binary systems consisting of two compact objects (neutron stars (NS), black holes (BH), ...) orbiting each other on an inspiralling trajectory due to the loss of orbital energy and angular momentum through gravitational radiation. A lot of effort has been put into modelling the waveforms from coalescing binaries […]. The waveforms (often referred to as chirps) are modelled with reasonable accuracy, so that matched filtering can be employed to search the data for these signals. In the case of the $\chi^2$ discriminator, we use the time-frequency properties of the chirp in order to discard (veto out) any spurious event which produces an SNR above a preset threshold at the matched filter output. The performance of $\chi^2$ may depend on the number of bins used in computing the statistic.

In this paper we suggest a possible way to optimize the $\chi^2$ discriminator for the number of bins. We use software injections (adding simulated signals) into data taken by the GEO 600 detector during the first science run (S1) in order to study the distribution of the $\chi^2$ statistic for simulated signals and for noise-generated events. The optimal number of bins is the one which maximizes the detection probability for a given false alarm rate. This method is quite generic and can be used for tuning any vetoing statistic which depends on one or several parameters.

Though the $\chi^2$ discriminator works reasonably well, it is still desirable to have additional independent signal based vetoes, which would either increase our confidence or improve our ability to separate genuine events from spurious ones. Some investigations have already been made in this direction […]. In addition to signal based vetoes, a heuristic veto method was suggested in [15]. It is based on counting the number of SNR threshold crossings within a short time window.

In this paper we suggest several new signal based statistics which can complement $\chi^2$ or enhance its performance. We introduce a statistic inspired by the Kolmogorov-Smirnov "goodness-of-fit" test […], which we call the $d$-statistic, and derive its probability distribution function for signals buried in Gaussian noise. We also suggest a few other $\chi^2$-like and $d$-like statistics and show that their combination can increase the vetoing efficiency even further.

Throughout the paper we use the following assumptions and simplifications. We assume that the waveforms used in our simulations, "Taylor" approximants (t1) at second post-Newtonian order in the notation used in […], are an exact representation of the astrophysical signal. The study performed in this paper is not restricted to this waveform model and could be repeated for any other model at the desired post-Newtonian order. The waveforms depend on several parameters: some are intrinsic to the system, like the masses and spins, while others are extrinsic, like the time and phase of arrival of the gravitational wave signal. To search for such signals we use a bank of templates, which can be seen as a grid in the parameter space [17]. The separation of templates in the parameter space is defined by the allowed loss in SNR (or, equivalently, by a loss in detection probability). The detector output is usually filtered through a bank of templates for parameter estimation [18, …]. For the sake of simplicity we have used a single template with parameters identical, or very close, to those of the signal used in the Monte Carlo simulation described in Section III.

This paper is structured as follows. We start in Section II by recalling the widely used […] $\chi^2$ time-frequency discriminator […]. In Section III we describe the method to optimize the $\chi^2$ veto for the number of bins. Though we show its performance for $\chi^2$ optimization, the method is applicable to any discriminator which depends on some free parameters. Section IV is dedicated to alternative vetoing statistics. There we start with the $d$-statistic, then we show a few more examples of $d$- and $\chi^2$-like statistics ($\tilde d$ and $r^2$, respectively) which can potentially increase the vetoing efficiency further. For instance, we show that the combination of the $\tilde d$ and $r^2$ statistics (namely their product) gives the best performance for a day's worth of GEO 600 data. We summarize the main results in the concluding Section V, and some detailed derivations are given in Appendix A.



II. CONVENTIONS AND $\chi^2$ DISCRIMINATOR

In this Section we introduce the notation which will be used throughout the paper and we reformulate the $\chi^2$ discriminator using the new notation. This should be useful in the following sections, where we discuss $\chi^2$ optimization and alternative signal based vetoing statistics.

Throughout this paper we assume, without loss of generality, that the signal has a known phase and a known time of arrival. Indeed, we can use the phase and time of arrival obtained from the maximization of the SNR. Alternatively, one can extend the derivations below in a manner similar to […] to deal with the unknown phase.

The detector output sampled at $t_j = j\Delta t$ is denoted by $x(t_j) = n(t_j) + A s(t_j)$, where $n(t_j)$ is noise and $s(t_j)$ is a signal, which corresponds to a gravitational wave of amplitude $A$. Since we will be working mainly in the frequency domain, we use tilde notation for the Fourier image of a time series: $\tilde x(f_k) = \tilde n(f_k) + A\tilde s(f_k)$. The discrete Fourier transform is evaluated at the frequencies $f_k = k/(M\Delta t)$, where $M$ is the number of points.

In order to introduce the $\chi^2$ discriminator we need to define the following quantities:

(1)

Note that this notation is different from that used in […]. Here the one-sided noise power spectral density (PSD) $S_n(f_k)$, defined as

$$\tfrac{1}{2}S_n(|f_k|)\,\delta_{kk'} = E\big(\tilde n(f_k)\,\tilde n^*(f_{k'})\big), \tag{2}$$

is assumed to be known; "c.c." as well as "*" denote complex conjugation, and $\Delta f = 1/(M\Delta t)$. We have chosen to work with discrete time and frequency series to be close to reality. Hereafter we use $E(\ldots)$ for the ensemble average and $\mathrm{var}(\ldots)$ for the variance of the distribution. The frequency boundaries $F_0$, $F_N$ correspond to the frequency at which the gravitational wave signal enters the sensitivity band of the instrument [27] and the frequency at the last stable orbit, $F_N = f_{\rm isco}$ [29] (sometimes also referred to as the frequency at the innermost stable circular orbit). In this notation $Q$ corresponds to the SNR (up to a numerical factor which does not play any role in the further analysis) and $Q_i$ is the part of the total SNR accumulated in the frequency band between $F_{i-1}$ and $F_i$. We choose a normalization for the templates such that $S = 1$. Let us emphasize again that we have assumed that we know the phase and time of arrival, so they are incorporated in the definition of the waveform $s(t_j)$.

For the $\chi^2$ discriminator, we choose the frequency bands (bins) $F_1, \ldots, F_N$ so that there is an equal power of signal in each band: $S_i = S/N = 1/N$. Then the $\chi^2$ discriminator can be written in the notation adopted here as follows:

$$\chi^2 = N\sum_{i=1}^{N}\left(Q_i - \frac{Q}{N}\right)^2. \tag{3}$$

If the detector noise is Gaussian, then the above statistic obeys a $\chi^2$ distribution with $N - 1$ degrees of freedom. The main idea behind the $\chi^2$ discriminator is to split the template $\tilde s(f)$ into sub-templates defined in different frequency bands, so that if the data contains a genuine gravitational wave signal, the contributions $Q_i$ from each sub-template to the total SNR $Q$ are equal, $E(Q_i) = A/N$.

In the presence of a chirp in the data, or if the data is pure Gaussian noise, the value of $\chi^2$ is low: $E(\chi^2) = N - 1$. However, if the data contains a glitch which is not consistent with the inspiral signal, then the value of $\chi^2$ is large. This statistic is very efficient in vetoing spurious events that cause a large SNR in the matched filter output. It was used in the search pipeline for setting an upper limit on the rate of coalescing NS binaries [6].

If we want to apply this vetoing statistic in a binary BH search, we should modify the $\chi^2$ discriminator of Eq. (3) to increase its efficiency. In practice, it might be difficult to split $S$ into bands of exactly the same power $S_i$ for signals from high-mass systems; in other words, it might be difficult to achieve $S_i = 1/N$ exactly. Indeed, the bandwidth of the signal from a binary BH decreases with increasing total mass, $f_{\rm isco} = 1/(6^{3/2}\pi m)$, where we use $G = c = 1$ and $m = m_1 + m_2$ is the total mass [29]. In addition, we work with a finite frequency resolution, which we might want to coarsen to save computational time. Finally, the accuracy of splitting the total frequency band depends on the number of bins.
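To make the mass dependence concrete, the short Python sketch below evaluates $f_{\rm isco} = 1/(6^{3/2}\pi m)$ in hertz by expressing the total mass as a time. The conversion constant $G M_\odot/c^3 \approx 4.925491\times10^{-6}$ s is the standard value rather than something taken from this paper, and the function name `f_isco_hz` is purely illustrative.

```python
import math

# G * M_sun / c^3 in seconds: the standard "solar mass in time units".
T_SUN_S = 4.925491e-6


def f_isco_hz(total_mass_msun):
    """f_isco = 1 / (6^{3/2} pi m) with G = c = 1, m converted to seconds."""
    m_seconds = total_mass_msun * T_SUN_S
    return 1.0 / (6.0 ** 1.5 * math.pi * m_seconds)


for m in (2.8, 10.0, 20.0, 40.0):
    print(f"m = {m:4.1f} M_sun  ->  f_isco = {f_isco_hz(m):7.1f} Hz")
```

Because $f_{\rm isco} \propto 1/m$, doubling the total mass halves the upper frequency cut-off, so fewer frequency samples are available to be shared among the bins.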

Based on this we suggest a modification of the $\chi^2$ discriminator which does not change its statistical properties but enhances its performance […]. We introduce $p_i = S_i/S$, which is close to $1/N$ but not exactly equal to it. Then we should redefine the $\chi^2$ statistic according to

$$\chi^2 = \sum_{i=1}^{N}\frac{(Q_i - p_i Q)^2}{p_i}. \tag{4}$$

We refer to […] for more details on this modification and its properties.
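A minimal Monte Carlo sketch of Eq. (4) follows, assuming the Gaussian model worked out in Appendix A: each frequency sample contributes an independent $y_k \sim \mathcal{N}(A\phi_k, \phi_k)$ with $\sum_k \phi_k = 1$, $Q_i$ is the sum of the $y_k$ in band $i$ and $p_i$ the sum of the $\phi_k$. The smooth power profile `PHI` and all helper names are hypothetical. Because the bands are assigned from the cumulative power, the $p_i$ are close to, but not exactly, $1/N$, which is the situation Eq. (4) is meant to handle; the sample mean of $\chi^2$ should come out near $N - 1$ whatever the amplitude $A$.

```python
import math
import random


def bin_index(phi, n_bins):
    """Assign each frequency sample to one of n_bins bands using the signal
    power accumulated before it, so each band holds roughly 1/n_bins of it."""
    idx, psi = [], 0.0
    for w in phi:
        idx.append(min(int(psi * n_bins), n_bins - 1))
        psi += w
    return idx


def chi2_statistic(y, phi, n_bins):
    """Eq. (4): sum_i (Q_i - p_i Q)^2 / p_i with band sums Q_i and p_i."""
    Qi, pi = [0.0] * n_bins, [0.0] * n_bins
    for yk, wk, b in zip(y, phi, bin_index(phi, n_bins)):
        Qi[b] += yk
        pi[b] += wk
    Q = sum(Qi)
    return sum((q - p * Q) ** 2 / p for q, p in zip(Qi, pi))


def simulate_y(phi, amplitude, rng):
    """y_k ~ N(A phi_k, phi_k): Gaussian noise plus a perfectly matched chirp."""
    return [amplitude * w + math.sqrt(w) * rng.gauss(0.0, 1.0) for w in phi]


# Hypothetical smooth signal-power profile over 500 frequency samples,
# normalised so that sum(PHI) = 1 (i.e. S = 1).
_raw = [math.exp(-k / 200.0) for k in range(500)]
_total = sum(_raw)
PHI = [w / _total for w in _raw]


def mean_chi2(n_bins, amplitude, trials=1000, seed=1):
    """Sample mean of chi^2 over independent noise realisations."""
    rng = random.Random(seed)
    return sum(chi2_statistic(simulate_y(PHI, amplitude, rng), PHI, n_bins)
               for _ in range(trials)) / trials
```

For example, `mean_chi2(16, 0.0)` and `mean_chi2(16, 13.0)` should both be close to 15; with the same seed they agree up to rounding, because the signal term cancels exactly in $Q_i - p_i Q$.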



III. OPTIMIZATION OF VETOING STATISTIC 



In this section we present a method for optimizing parameter-dependent vetoing statistics. This method also helps to tune the veto threshold for a signal based statistic. Though the main focus in this section will be on the optimization of the $\chi^2$ statistic with respect to the number of bins, the method can also be applied to the general case (see Section IV).

First we need to define playground data. Playground data is a small subset of the available data chosen to represent the statistical properties of the whole data set [21]. The main idea is to use software injections of chirps (adding simulated signals) into the playground data and to compare the distributions of $\chi^2$ for the injected signals and for spurious events. There is a trade-off in the number of software injections: on the one hand, we should not populate the data stream with too many chirps, as this would corrupt the estimation of the PSD; on the other hand, the number of injections should not be too small, so that we can accumulate a sufficiently large number of samples ("sufficiently large" should be quantified, see [22]). Another issue is the amplitude of the injected signals: the amplitude should be realistic, which means close to the SNR threshold used for the search. The parameters of the injected chirps (such as masses, spins, etc.) should either be fixed (optimization with respect to a particular signal) or correspond to the range of parameters used for templates in the bank. A generalization could be optimization with respect to several signals (or groups of signals) and the use of a different number of bins for different (sets of) parameters. That could happen in reality: the searches for binary NS and binary BH might have different optimal numbers of bins.

To ease our way through, we give an example of the optimization of $\chi^2$ for signals from a $5$–$5\,M_\odot$ system. We injected a waveform with mass parameters $5.0$–$5.0\,M_\odot$ and SNR = 13 into every 5th segment of the analyzed data; each segment was 16 seconds long. Then 2.5 hours of GEO 600 S1 data were filtered through the TaylorT1 template (at second post-Newtonian order) with mass parameters $5.04$–$5.04\,M_\odot$. The template TaylorT1 corresponds to "t1" in […]. By having a slight mismatch in the masses of the system, we have tried to mimic a possible mismatch due to the coarseness of the template bank. We have separated the triggers which correspond to the injected signals from the spurious events by using a 5 ms window around the time of injection. The SNR threshold was chosen to be 6. Then we produced histograms of the $\chi^2$ distribution for injected/detected signals and for spurious events. This procedure was performed for different numbers of bins for the $\chi^2$ statistic. The results are shown in Figure 1. The solid line histogram shows the distribution of $\chi^2$ for signals, and the shaded histogram corresponds to the distribution of $\chi^2$ for spurious events with SNR > 6.
FIG. 1: Distribution of $\chi^2$ for simulated signals (the histogram drawn by the solid line) and for spurious events in GEO 600 S1 data (the shaded histogram).

We want the distribution of $\chi^2$ for injected signals to be separated as much as possible from the distribution of $\chi^2$ for the spurious events. The optimal number of bins is the one which corresponds to the minimum overlap between those two distributions. One can see that for the case considered above the optimal number lies somewhere close to 20. We need a more rigorous way to define the optimal number of bins, so we need to quantify the overlap between the two distributions. Here we apply the standard detection technique […]. First we normalize the distributions $P_1(x,N)$ (the distribution of $\chi^2$ for the injected signals) and $P_2(x,N)$ (the distribution of $\chi^2$ for the spurious events) so that

$$\int_0^{+\infty} P_1(x)\,dx = 1, \qquad \int_0^{+\infty} P_2(x)\,dx = 1. \tag{5}$$

$P_2(x)$ defines the false alarm probability distribution function, so that we can fix the false alarm probability according to

$$\int_0^{\eta(N,\alpha)} P_2(x,N)\,dx = \alpha. \tag{6}$$

By fixing the false alarm probability $\alpha$, we are essentially fixing the threshold $\eta(N,\alpha)$ on $\chi^2$; events with $\chi^2$ above $\eta$ are vetoed. Note that the threshold is a function of the number of bins and of the false alarm probability. For real data, $\eta$ cannot be computed analytically, since $P_2$ depends on the spurious events, or rather on the similarity of the spurious events to the chirp signal. Thus the purpose of the playground data is to characterize the non-stationarities in the data.

We call the number of bins optimal if, for a given $\alpha$, it maximizes the detection probability $P_d$,

$$\int_0^{\eta(N,\alpha)} P_1(x,N)\,dx = P_d. \tag{7}$$

In other words,

$$N_{\rm opt} = \arg\max_N \int_0^{\eta(N,\alpha)} P_1(x,N)\,dx. \tag{8}$$



Note that we know $P_1(x,N)$ only for chirps plus Gaussian noise. The detector's noise, however, is not Gaussian over long time scales, so $P_1(x)$ is also, strictly speaking, unknown to us. This is why we have used software injections. As a bonus we also obtain the threshold on $\chi^2$, $\eta(N,\alpha)$, which should be used in the analysis of the full data set.
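The recipe of Eqs. (6)-(8) can be sketched with empirical distributions as follows. This is a minimal illustration, not the analysis code used for the paper: the function names are hypothetical, the empirical quantile convention is one of several reasonable choices, and the optional `truncate` argument mimics the 5% tail truncation of the spurious-event distribution described after Table I. Smaller values of the statistic are taken to be more signal-like, so events above the threshold are vetoed.

```python
def veto_threshold(spurious, alpha, truncate=0.0):
    """Empirical eta(N, alpha) of Eq. (6): the value below which a fraction
    alpha of the spurious-event statistic values lie."""
    s = sorted(spurious)
    if truncate > 0.0:
        # Optionally drop the largest values (tail truncation).
        s = s[: len(s) - int(truncate * len(s))]
    n_below = max(1, round(alpha * len(s)))
    return s[n_below - 1]


def detection_probability(injected, spurious, alpha, truncate=0.0):
    """Eq. (7): fraction of injected signals with statistic <= eta."""
    eta = veto_threshold(spurious, alpha, truncate)
    p_d = sum(1 for v in injected if v <= eta) / len(injected)
    return p_d, eta


def optimal_parameter(samples, alpha, truncate=0.0):
    """Eq. (8): the parameter value (e.g. number of bins N) maximising P_d.

    samples maps N -> (statistic values for injections,
                       statistic values for spurious events); it must be
    non-empty. Returns (N_opt, P_d, eta).
    """
    best = None
    for n, (injected, spurious) in samples.items():
        p_d, eta = detection_probability(injected, spurious, alpha, truncate)
        if best is None or p_d > best[1]:
            best = (n, p_d, eta)
    return best
```

In practice each entry of `samples` would hold the statistic computed with a given number of bins for the injected signals and for the spurious triggers from the playground data.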

As one can see, this method can be applied to any signal based vetoing statistic. In Section IV we will apply it to determine the efficiency of other statistics. As an example, we can apply Eq. (8) to the simulation described above and quantify the results presented in Fig. 1.

TABLE I: Optimization of $\chi^2$. Detection probability and threshold on $\chi^2$ for various numbers of bins. The false alarm probability is the same in all cases.

N bins        8      16      24      32      40      50      64      86
P_d         58%   81.2%   85.4%     82%   75.8%   68.3%   62.4%     47%
threshold  1.59   8.985    20.8   34.54   47.46   65.97   95.11     143




FIG. 2: Graphical representation of the first two rows of Table I (detection probability versus number of bins). The solid line is a cubic spline interpolation.



The results given in Table I (especially $P_d$) should be taken with caution. We have injected only 214 signals, which might not be enough to make a definite statement. However, it is a very good indication of what the optimal number of bins is. We have quite a large number (2550) of spurious events with SNR > 6, so the statement about the threshold for a given false alarm probability is fairly solid. It should also be mentioned that we have truncated the tail of the $\chi^2$ distribution for spurious events by neglecting the 5% of all events with the largest $\chi^2$ (we apply the same 5% truncation to the false alarm distributions in Section IV as well). We have also performed a cubic spline interpolation between these points (see Fig. 2) to show that the optimal number of bins indeed lies somewhere close to 20.
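To reproduce the kind of interpolation shown in Fig. 2, one can fit a cubic spline through the $(N, P_d)$ pairs of Table I and locate its maximum on a fine grid. The paper does not state which spline boundary conditions were used; the sketch below assumes a natural cubic spline (zero second derivative at the end points), written in plain Python so that no particular library API is assumed. The helper names are illustrative.

```python
# (N bins, P_d in percent) pairs from Table I.
N_BINS = [8.0, 16.0, 24.0, 32.0, 40.0, 50.0, 64.0, 86.0]
P_D = [58.0, 81.2, 85.4, 82.0, 75.8, 68.3, 62.4, 47.0]


def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through the points (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m_0..m_n; rows 0 and n
    # encode the natural boundary conditions m_0 = m_n = 0.
    a, b = [0.0] * (n + 1), [1.0] * (n + 1)
    c, r = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        a[i], b[i], c[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        r[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        r[i] -= w * r[i - 1]
    m = [0.0] * (n + 1)
    m[n] = r[n] / b[n]
    for i in range(n - 1, -1, -1):
        m[i] = (r[i] - c[i] * m[i + 1]) / b[i]

    def spline(x):
        i = 0
        while i < n - 1 and x > xs[i + 1]:
            i += 1
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)

    return spline


def interpolated_optimum(xs, ys, step=0.1):
    """Grid search for the maximum of the spline between xs[0] and xs[-1]."""
    s = natural_cubic_spline(xs, ys)
    n_steps = int(round((xs[-1] - xs[0]) / step))
    grid = [xs[0] + k * step for k in range(n_steps + 1)]
    best = max(grid, key=s)
    return best, s(best)
```

Given only eight points and the limited number of injections discussed above, the location returned by `interpolated_optimum(N_BINS, P_D)` should be read as indicative rather than precise.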

At the end of this Section we would like to mention that an optimal parameter might not exist, or it might not be an obvious one (see the end of Section IV for an example).



IV. OTHER SIGNAL BASED VETO STATISTICS 



In this Section we consider other signal based veto statistics. We start with a statistic that was inspired by the Kolmogorov-Smirnov "goodness-of-fit" test and describe its statistical properties in the case of Gaussian noise. Then we consider some possible modifications of that statistic and another $\chi^2$-like statistic, which we call $r^2$. We show their performance using GEO 600 S1 data.



A. Kolmogorov-Smirnov based statistic 

The original Kolmogorov-Smirnov "goodness-of-fit" test […, 24] compares two cumulative probability distributions, $S(x)$ and $P(x)$ (see Figure 3), and the test statistic is the maximum distance $D$ between the curves $S(x)$ and $P(x)$.





FIG. 3: Schematic representation of the Kolmogorov-Smirnov test: comparison between two cumulative distributions, where $P(x)$ is a theoretical distribution and $S(x)$ is an observed one. The Kolmogorov-Smirnov statistic is $D$.

Here we suggest a vetoing statistic which is somewhat similar to the Kolmogorov-Smirnov one; more precisely, the new statistic was inspired by the Kolmogorov-Smirnov test. We start by defining a few more quantities:



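$$\psi_i = 2\sum_{f_k=F_0}^{F_i}\frac{|\tilde s(f_k)|^2}{S_n(f_k)}\,\Delta f, \qquad i = 1,\ldots,M, \qquad F_M = f_{\rm isco}, \qquad \psi_M = 1, \tag{9}$$

$$q_i = \sum_{f_k=F_0}^{F_i}\left(\frac{\tilde x(f_k)\,\tilde s^*(f_k)}{S_n(f_k)} + {\rm c.c.}\right)\Delta f, \qquad q_M = Q, \qquad y_k = \left(\frac{\tilde x(f_k)\,\tilde s^*(f_k)}{S_n(f_k)} + {\rm c.c.}\right)\Delta f, \tag{10}$$

where $M$ is defined by the frequency resolution.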

The main idea is to compare two cumulative functions: the cumulative signal power within the signal's frequency band and the cumulative SNR, which is essentially the correlation between the detector output and a template within the same frequency band. We introduce the vetoing statistic

$$d = \max_i |q_i - \psi_i Q|, \qquad i = 1,\ldots,M-1, \tag{11}$$

and call it the $d$-statistic. However, we have found that, in practice, another statistic,

$$\tilde d = \max_i\left|\frac{q_i}{Q} - \psi_i\right|,$$

performs better. Nevertheless, we start with the $d$-statistic and postpone consideration of $\tilde d$ to the next subsection. The main question which we want to address is: what is the probability of $d > D$ in the presence of a true chirp in Gaussian noise? Although we know that the detector's noise is not Gaussian, we can treat it as Gaussian noise plus non-stationarities (spurious transient events), and we try to discriminate those non-stationarities from genuine gravitational wave signals. We refer the reader to Appendix A for detailed calculations and quote here only the final results. If we introduce $Y_i = q_i - \psi_i Q$ (so that $d = \max_i|Y_i|$), then the probability distribution function $P(Y_1,\ldots,Y_{M-1})$ is the multivariate Gaussian probability distribution function and

$$\Pr(d > D) = 1 - \int_{-D}^{D}\!\!\cdots\!\int_{-D}^{D}\frac{dY_1\ldots dY_{M-1}}{(2\pi)^{(M-1)/2}\sqrt{\det C}}\,\exp\left(-\frac{1}{2}\,Y C^{-1} Y^T\right), \tag{12}$$

where $C$ is the covariance matrix of the $Y_i$, whose inverse is given in Eq. (A9).
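A Monte Carlo sketch of the $d$-test under the Gaussian model of Appendix A (independent $y_k \sim \mathcal{N}(A\phi_k, \phi_k)$ with $\sum_k\phi_k = 1$) is given below; it estimates $\Pr(d > D)$ directly instead of evaluating the $(M-1)$-dimensional integral of Eq. (12). The flat power profile and all function names are assumptions for illustration. The comparison with the Kolmogorov tail is our own cross-check, not part of the paper: with many small $\phi_k$ the $Y_i$ behave like a Brownian bridge (their covariance is $\psi_i(1-\psi_j)$ for $i \le j$), so $\Pr(d > D)$ should approach $2\sum_{k\ge1}(-1)^{k-1}e^{-2k^2D^2}$, slightly from below because the maximum is taken over a finite grid.

```python
import math
import random


def d_statistic(y, phi):
    """Eq. (11): d = max_i |q_i - psi_i Q| for i = 1..M-1, where q_i and psi_i
    are running sums of y_k and phi_k, and Q = q_M = sum(y)."""
    Q = sum(y)
    q = psi = d = 0.0
    for yk, wk in zip(y[:-1], phi[:-1]):
        q += yk
        psi += wk
        d = max(d, abs(q - psi * Q))
    return d


def prob_d_exceeds(D, phi, amplitude=0.0, trials=3000, seed=2):
    """Monte Carlo estimate of Pr(d > D) for a chirp of amplitude A in
    Gaussian noise, drawing y_k ~ N(A phi_k, phi_k) independently."""
    rng = random.Random(seed)
    sigma = [math.sqrt(w) for w in phi]
    hits = 0
    for _ in range(trials):
        y = [amplitude * w + s * rng.gauss(0.0, 1.0) for w, s in zip(phi, sigma)]
        if d_statistic(y, phi) > D:
            hits += 1
    return hits / trials


def kolmogorov_tail(D, terms=100):
    """Continuum (Brownian-bridge) limit: Pr(sup |B(t)| > D)."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * D * D)
                     for k in range(1, terms + 1))


M = 400
PHI_UNIFORM = [1.0 / M] * M  # hypothetical flat power profile, sum = 1
```

Running `prob_d_exceeds(1.0, PHI_UNIFORM, A)` for different amplitudes `A` illustrates that the distribution of $d$ does not depend on the signal amplitude.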

To show the performance of the $d$-test, we have computed $d$ for a glitch that produced SNR = 16 at the output of the matched filter and for a simulated chirp added to the data. The result is presented in Fig. 4. The upper two panels show $q_i$: the top graph is plotted for a true chirp, and the middle graph is for a spurious event. The dashed line corresponds to the expected cumulative SNR ($\psi_i Q$) and the solid line is the actual accumulation ($q_i$). The lower panel shows the distance $|q_i - \psi_i Q|$ as a function of frequency. The solid line here corresponds to the injected signal and the dashed line to the spurious event.




FIG. 4: Performance of the $d$-test. Comparison of the cumulative SNR versus the expected one (solid and dashed lines, respectively) for an injected chirp (top graph) and for a spurious event (middle graph). The bottom plot shows the distance $|q_i - \psi_i Q|$ as a function of frequency in Hz (the solid line is for the injected chirp and the dashed line is for the spurious event).



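As one can see, this test works in practice. However, we have found that the $\tilde d$-statistic, defined above, performs better. One reason for this is that for loud gravitational wave signals we might obtain a large $d$ due to a slight mismatch in parameters caused by the coarseness of the template bank: since $d$ is not normalized by $Q$, a fixed fractional mismatch produces a distance that grows with the signal amplitude, whereas $\tilde d$ is normalized by $Q$.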



B. Other vetoing statistics 



We start with another $\chi^2$-like discriminator, which we call $r^2$, built from the fractional band contributions $Q_i/Q$. The interesting fact is that the TAMA group [8] is using a similar statistic (related to the inverse of this quantity) for the purpose of detection. In the following we omit the number of bins $N$, as it is just an overall scaling factor which does not affect vetoing. One can see that the $\chi^2$ introduced in Eq. (4) is related to the new statistic according to $\chi^2 = Q^2 r^2$. It is possible to derive the probability distribution function for $\Delta Q_i = Q_i/Q - p_i$ in Gaussian noise following the same line as described in Appendix A. Unfortunately, the expression is quite messy, especially for a large number of bins $N$, and it is not very useful in practice. To check the performance of this statistic we have conducted simulations similar to the ones described in Section III. Namely, we have injected a chirp signal into a day's worth of S1 GEO 600 data and plotted the two $r^2$ distributions in the upper half of Fig. 5. The shaded histogram in the upper plot is the distribution of $r^2$ for spurious events with SNR > 9 and the solid line curve is the distribution of $r^2$ for injected chirp signals. We have chosen 20 bins to compute $r^2$. Applying the scheme defined in Section III, we find that the detection probability is 95.9% and the threshold is 16.47 for a false alarm probability of 1%. Note that we did not use playground data for these simulations, so our result might be biased by the choice of a particular data set.
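The text relates the new statistic to Eq. (4) through $\chi^2 = Q^2 r^2$ (with the overall factor $N$ dropped). Assuming the form this implies, $r^2 = \sum_i (Q_i/Q - p_i)^2/p_i$ (an assumption, not necessarily the authors' exact definition), a minimal sketch computes both statistics from band sums $Q_i$ and power fractions $p_i$ and shows that $r^2$, unlike $\chi^2$, is unchanged when all $Q_i$ are rescaled by a common factor. The band values are hypothetical.

```python
def chi2_from_bands(Qi, pi):
    """Eq. (4): chi^2 = sum_i (Q_i - p_i Q)^2 / p_i."""
    Q = sum(Qi)
    return sum((q - p * Q) ** 2 / p for q, p in zip(Qi, pi))


def r2_from_bands(Qi, pi):
    """Assumed r^2 = sum_i (Q_i/Q - p_i)^2 / p_i, i.e. chi^2 / Q^2."""
    Q = sum(Qi)
    return sum((q / Q - p) ** 2 / p for q, p in zip(Qi, pi))


bands = [3.0, 1.0, 2.0]       # hypothetical band contributions, Q = 6
fractions = [0.5, 0.25, 0.25]  # hypothetical power fractions, sum = 1
chi2 = chi2_from_bands(bands, fractions)
r2 = r2_from_bands(bands, fractions)
print(chi2, r2, sum(bands) ** 2 * r2)  # the last two numbers give chi^2 = Q^2 r^2
```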

Next, we modify the $d$-statistic according to

$$\tilde d = \max_i\left|\frac{q_i}{Q} - \psi_i\right|. \tag{14}$$

Define $\tilde Y_i = q_i/Q - \psi_i$. We skip the derivation of the probability distribution function $P(\tilde Y_1,\ldots,\tilde Y_{M-1})$ in Gaussian noise. As in the case of the $r^2$ statistic, the probability cannot be expressed in a nice closed form and is therefore not useful in practical applications. The performance of the $\tilde d$ statistic is also shown in Fig. 5 (lower graph). To produce this figure we have used the same simulation as for $r^2$. The detection probability for the $\tilde d$-test is 94.3% and the threshold is 0.21 for a false alarm probability of 1%.
FIG. 5: Performance of the $r^2$ and $\tilde d$ vetoing statistics, presented in the upper and lower plots, respectively. The shaded histogram corresponds to spurious events, and the solid line histogram is the distribution of the vetoing statistic for injected signals. We have used one day's worth of S1 GEO 600 data to conduct these simulations.



Another possible modification of the $d$-statistic is to choose not the largest distance but a percentile value, in other words, the maximum distance after throwing away, say, 3% of the largest distances. The percentile value could be considered as a parameter of the statistic and could be optimized for. To finish with the $d$-like statistics, let us give a few other possibilities:

$$d^* = \max_i\frac{|q_i/Q - \psi_i|}{\sqrt{\psi_i(1 - \psi_i)}}, \tag{15}$$

$$V = d_+ + d_- = \max_i\left(\frac{q_i}{Q} - \psi_i\right) + \max_i\left(\psi_i - \frac{q_i}{Q}\right). \tag{16}$$

The first one, defined by Eq. (15), is the analogue of the Anderson-Darling [25] statistic and the second one, Eq. (16), is the analogue of the Kuiper statistic [26].
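The $d$-like variants can all be computed from the same running sums. The sketch assumes $\tilde d = \max_i|q_i/Q - \psi_i|$ as in Eq. (14), an Anderson-Darling-like weighting by $\sqrt{\psi_i(1-\psi_i)}$ for $d^*$, and the Kuiper-like sum $V = d_+ + d_-$, where the maxima in $V$ also include the end points at which the two cumulative curves coincide (so $d_\pm \ge 0$). The optional `drop_fraction` (assumed smaller than 1) implements the percentile idea by discarding the largest distances before taking the maximum. Function names are illustrative only.

```python
import math


def cumulative_pairs(y, phi):
    """(q_i/Q, psi_i) for i = 1..M-1 from per-sample y_k and phi_k."""
    Q = sum(y)
    q = psi = 0.0
    pairs = []
    for yk, wk in zip(y[:-1], phi[:-1]):
        q += yk
        psi += wk
        pairs.append((q / Q, psi))
    return pairs


def d_tilde(y, phi, drop_fraction=0.0):
    """Eq. (14); with drop_fraction > 0 the largest distances are discarded
    first (percentile variant)."""
    dist = sorted(abs(f - p) for f, p in cumulative_pairs(y, phi))
    n_drop = int(drop_fraction * len(dist))
    return dist[len(dist) - 1 - n_drop]


def d_star(y, phi):
    """Anderson-Darling-like: distance weighted by 1/sqrt(psi_i (1 - psi_i))."""
    return max(abs(f - p) / math.sqrt(p * (1.0 - p))
               for f, p in cumulative_pairs(y, phi))


def kuiper_v(y, phi):
    """Kuiper-like V = d_+ + d_-; the 0.0 entries stand for the end points."""
    pairs = cumulative_pairs(y, phi)
    d_plus = max([0.0] + [f - p for f, p in pairs])
    d_minus = max([0.0] + [p - f for f, p in pairs])
    return d_plus + d_minus
```

For instance, if all the SNR is accumulated in the first of four equal-power samples, `y = [4.0, 0.0, 0.0, 0.0]` with `phi = [0.25] * 4`, the cumulative SNR fraction jumps to 1 immediately and every statistic above is large, while a perfectly proportional accumulation gives zero.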

Interestingly, the product of the statistics, $\tilde d \times r^2$, works even better than each of them separately, as one can see in Figure 6. The detection probability in this case is 98.3% and the threshold is 4.7 for a false alarm probability of 1%.

FIG. 6: Distribution of the product statistic ($\tilde d \times r^2$) for injected signals (solid line histogram) and for spurious events (shaded histogram). We have used the same day-long GEO 600 data as for the results presented in Fig. 5.



The reason that the product of the two statistics works better than each of them separately could be that $r^2$ and $\tilde d$ are better suited to different types of spurious events, while being equally good for true signals. The statistics in the product complement each other, so that a larger number of spurious events is vetoed.

We have tried to optimize $\tilde d \times r^2$ with respect to the number of bins, following the same line and conducting similar simulations as described in Section III. However, we have not found an obvious choice for the optimal number of bins. This is because the detection probability for $\tilde d \times r^2$ as a function of the number of bins fluctuates slightly about a constant value for numbers of bins between 18 and 40.



V. CONCLUSION 



In this paper we have considered several signal based vetoes. These are statistics based on our knowledge of the signal we search for, which help us to discriminate genuine gravitational wave signals from spurious events of instrumental or environmental origin.

We have outlined a method to optimize $\chi^2$-like statistics for the number of bins. This method is based on adding simulated signals to real data and studying the distribution of the vetoing statistic for injected signals and for spurious events. The optimal number of bins is the one which maximizes the detection probability for a fixed false alarm probability. This method also automatically provides us with the vetoing threshold.

We have considered two other very promising signal based vetoes: $r^2$, the $\chi^2$-like discriminator, and $\tilde d$, a statistic inspired by the Kolmogorov-Smirnov "goodness-of-fit" test. Using simulated injections into GEO 600 S1 data, we have shown that both these statistics can give a very high detection probability (> 94%) for a given false alarm probability (1%). We have also pointed out that we can achieve even better performance if we take the product of the two statistics as a new veto.

Finally, let us emphasize that the results of the simulations presented here are data dependent, and the exact numbers for efficiency may vary for different detectors and/or for different data sets of the same detector. However, as follows from the analytical evaluations and as indicated by the simulations, we should expect good performance from all the signal based vetoes considered in this paper.
APPENDIX A: DERIVATION OF STATISTICAL PROPERTIES OF THE d-TEST

This Appendix is dedicated to deriving the probability that the $d$-statistic, introduced in Eq. (11), is larger than a chosen value $D$. The derivations presented here follow a line similar to the one described in Appendix A of […].

We assume that the detector's noise $n(t_j)$ is Gaussian. Introduce $Y_i = q_i - \psi_i Q$; then $d = \max_i|Y_i|$. The main question we want to address is: what is the probability of $d > D$ in the presence of a true chirp?

$$\Pr(d > D) = \Pr\big(\max_i|Y_i| > D\big) = 1 - \Pr\big(\max_i|Y_i| < D\big) = 1 - \Pr\big(|Y_1| < D, \ldots, |Y_{M-1}| < D\big) = 1 - \int_{-D}^{D}\!\!\cdots\!\int_{-D}^{D} P(Y_1,\ldots,Y_{M-1})\,dY_1\ldots dY_{M-1}. \tag{A1}$$

We need to find the probability distribution $P(Y_1,\ldots,Y_{M-1})$, and we start with the statistical properties of $y_k$. We know that the $y_k$ are $M$ independent Gaussian random variables. We can find their mean and variance,

$$E(y_k) = 2A\,\frac{|\tilde s(f_k)|^2}{S_n(f_k)}\,\Delta f = A(\psi_k - \psi_{k-1}) = A\phi_k, \tag{A2}$$

$$\mathrm{var}(y_k) = \phi_k, \qquad \text{where we used the notation} \qquad \phi_k = 2\,\frac{|\tilde s(f_k)|^2}{S_n(f_k)}\,\Delta f. \tag{A3}$$

Taking into account the fact that the $y_k$ are independent and normally distributed, $\mathcal{N}(A\phi_k, \phi_k)$, we can write



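$$P(y_1,\ldots,y_M) = \prod_{i=1}^{M}\frac{1}{\sqrt{2\pi\phi_i}}\exp\left(-\frac{(y_i - A\phi_i)^2}{2\phi_i}\right). \tag{A4}$$

We use the same trick as in […]:

$$\int dy_1\ldots dy_M\,P(y_1,\ldots,y_M)\,F\!\left(y_1 - \psi_1\sum_{k=1}^{M}y_k,\;\ldots,\;\sum_{k=1}^{M-1}y_k - \psi_{M-1}\sum_{k=1}^{M}y_k\right) = \int dx_1\ldots dx_{M-1}\,P(x_1,\ldots,x_{M-1})\,F(x_1,\ldots,x_{M-1}), \tag{A5}$$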
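and choose $F(x_1,\ldots,x_{M-1}) = \delta(x_1 - Y_1)\ldots\delta(x_{M-1} - Y_{M-1})$. This yields

$$P(Y_1,\ldots,Y_{M-1}) = \int dy_1\ldots dy_M\,P(y_1,\ldots,y_M)\,\delta\!\left(y_1 - \psi_1\sum_{k=1}^{M}y_k - Y_1\right)\ldots\delta\!\left(\sum_{k=1}^{M-1}y_k - \psi_{M-1}\sum_{k=1}^{M}y_k - Y_{M-1}\right). \tag{A6}$$

Under the following change of variables of integration, $(y_1,\ldots,y_M) \to (z_1,\ldots,z_{M-1},W)$,

$$\begin{aligned} y_1 &= z_1 + \phi_1 W, \qquad W = \sum_{k=1}^{M} y_k, \\ y_i &= z_i - z_{i-1} + \phi_i W, \qquad i = 2,\ldots,M-1, \\ y_M &= -z_{M-1} + \phi_M W, \end{aligned}$$

with Jacobian

$$J = \det\frac{\partial(y_1,\ldots,y_M)}{\partial(z_1,\ldots,z_{M-1},W)} = \sum_{k=1}^{M}\phi_k = 1,$$

and noting that $\sum_{k=1}^{i} y_k = z_i + \psi_i W$, so that the argument of the $i$-th $\delta$-function reduces to $z_i - Y_i$, the integral (A6) takes the form

$$P(Y_1,\ldots,Y_{M-1}) = \int dz_1\ldots dz_{M-1}\,dW\,\prod_{i=1}^{M}\frac{1}{\sqrt{2\pi\phi_i}}\exp\left(-\frac{(y_i - A\phi_i)^2}{2\phi_i}\right)\delta(z_1 - Y_1)\ldots\delta(z_{M-1} - Y_{M-1}). \tag{A7}$$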



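The argument of the exponent can be expressed in terms of the new variables according to

$$\sum_{i=1}^{M}\frac{(y_i - A\phi_i)^2}{\phi_i} = \sum_{i=1}^{M}\frac{(z_i - z_{i-1})^2}{\phi_i} + (W - A)^2 = Z C^{-1} Z^T + (W - A)^2, \tag{A8}$$

where in the expression above we used $z_0 = z_M = 0$ (so that the cross terms $2(z_i - z_{i-1})(W - A)$ sum to zero) and $\sum_i\phi_i = 1$; $Z = (z_1,\ldots,z_{M-1})$ is a row vector, and $C^{-1}$ is the inverse of the covariance matrix $C$:

$$C^{-1}_{ij} = \left(\frac{1}{\phi_i} + \frac{1}{\phi_{i+1}}\right)\delta_{ij} - \frac{1}{\phi_{i+1}}\,\delta_{i+1,j} - \frac{1}{\phi_i}\,\delta_{i,j+1}. \tag{A9}$$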



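Note that

$$\det(C_{ij}) = \prod_{k=1}^{M}\phi_k.$$

Taking all of the above into account and performing the integration over $W$, we arrive at the required probability distribution function

$$P(Y_1,\ldots,Y_{M-1}) = \frac{1}{(2\pi)^{(M-1)/2}\sqrt{\det(C_{ij})}}\exp\left(-\frac{1}{2}\,Y C^{-1} Y^T\right), \tag{A10}$$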



which is the multivariate Gaussian probability distribution function. The final result can be written as

$$\Pr(d > D) = 1 - \int_{-D}^{D}\!\!\cdots\!\int_{-D}^{D}\frac{dY_1\ldots dY_{M-1}}{(2\pi)^{(M-1)/2}\sqrt{\det(C_{ij})}}\exp\left(-\frac{1}{2}\,Y C^{-1} Y^T\right). \tag{A11}$$

One can also compute the mean and covariance of the $Y_i$:

$$E(Y_i) = 0, \tag{A12}$$

$$\mathrm{cov}(Y_i, Y_j) = \psi_i(1 - \psi_j), \qquad i \le j. \tag{A13}$$

We would like to emphasize that, as in the case of the $\chi^2$ discriminator, $Y_i$ and, correspondingly, $d$ do not depend on the signal amplitude $A$.
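The closed-form pieces of this Appendix are easy to check numerically. The plain-Python sketch below (helper names are ours, and the power fractions `phi` are arbitrary test values summing to 1) builds the covariance of Eq. (A13), $C_{ij} = \psi_{\min(i,j)}(1 - \psi_{\max(i,j)})$, and the tridiagonal matrix of Eq. (A9), and checks that their product is the identity, that $\det C = \prod_k \phi_k$, and that the Jacobian of the change of variables equals $\sum_k \phi_k = 1$.

```python
from itertools import accumulate


def covariance(phi):
    """Eq. (A13) for i, j = 1..M-1 (0-based indices here)."""
    psi = list(accumulate(phi))[:-1]
    n = len(psi)
    return [[psi[min(i, j)] * (1.0 - psi[max(i, j)]) for j in range(n)]
            for i in range(n)]


def inverse_covariance(phi):
    """Eq. (A9): diagonal 1/phi_i + 1/phi_{i+1}, off-diagonal -1/phi_{i+1}."""
    n = len(phi) - 1
    cinv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cinv[i][i] = 1.0 / phi[i] + 1.0 / phi[i + 1]
        if i + 1 < n:
            cinv[i][i + 1] = cinv[i + 1][i] = -1.0 / phi[i + 1]
    return cinv


def jacobian(phi):
    """d(y_1..y_M)/d(z_1..z_{M-1}, W) for the change of variables above."""
    m = len(phi)
    jac = [[0.0] * m for _ in range(m)]
    for j in range(m - 1):      # column for z_{j+1}
        jac[j][j] = 1.0         # y_{j+1} contains +z_{j+1}
        jac[j + 1][j] = -1.0    # y_{j+2} contains -z_{j+1}
    for i in range(m):          # last column: dy_i/dW = phi_i
        jac[i][m - 1] = phi[i]
    return jac


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, result = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return result
```

For example, with `phi = [0.1, 0.2, 0.3, 0.4]` the product `matmul(inverse_covariance(phi), covariance(phi))` is the 3 by 3 identity up to rounding.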